A generalized parsing framework for Abstract Grammars

Authors

  • Daniel Harasim
  • Chris Bruno
  • Eva Portelance
  • Timothy J. O'Donnell
Abstract

This technical report presents a general framework for parsing a variety of grammar formalisms. We develop a grammar formalism, called an Abstract Grammar, which is general enough to represent grammars at many levels of the Chomsky hierarchy, including Context-Free Grammars (CFGs), Minimalist Grammars (MGs), and formalisms weakly equivalent to MGs, such as Linear Context-Free Rewriting Systems. We then develop a single parsing framework that can parse grammars at least as high on the hierarchy as MGs. The framework exposes a grammar interface modelled on the Abstract Grammar formalism, so it can parse any grammar formalism that can be reduced to an Abstract Grammar.

A great deal of previous work is capable of parsing the grammars we treat here. Each of the grammars mentioned has parsers written specifically for it, and there are frameworks more general than ours, such as probabilistic programming languages that can specify arbitrary probabilistic programs. Parsers for specific grammars can exploit optimizations specific to the task and formalism they were designed for, and are therefore often faster than general systems. However, they cannot be used for formalisms other than the one they were intended for, which makes it difficult to prototype and compare different formalisms. Our framework is a middle ground between these two approaches: we aim to be general enough to parse a variety of interesting formal grammars while still exploiting optimizations specific to the parsing task.

In the following, we define Abstract Grammars as a generalization of Context-Free Grammars in which (i) the rewrite rules are partial functions and (ii) the set of nonterminals is paired with a set of operations to form a heterogeneous algebra.
We first define Abstract Context-free Grammars in §2, which incorporate property (i), and then define the fully general Abstract Grammars in §3, which incorporate property (ii). By (i), generalizing the rewrite rules to arbitrary partial functions, we can group related CFG rewrite rules into a common function, allowing those rules to share probability mass. This is useful for representing a musical syntax in which the rules of prolongation and preparation, for example, are independent of the key (Rohrmeier & Neuwirth, 2015; Rohrmeier, 2011; Lerdahl & Jackendoff, 1985). By (ii), generalizing the nonterminals to elements of a heterogeneous algebra, we can represent languages higher than context-free on the hierarchy. This is important for representing natural language, which occupies the space of mildly context-sensitive languages (Shieber, 1985; Joshi, 1985).

In §2, after defining Abstract Context-free Grammars, we present a reduction automaton over abstract CFGs, which is used to state an abstract grammar interface. We then describe the parsing algorithm, including code fragments of our Julia implementation (julialang.org; see also Bezanson et al., 2017). §3 defines Abstract Grammars and revises the reduction automaton and the grammar interface given in §2 for the fully general case. In §4, we show how to state specific interface functions for a Minimalist Grammar, and give an implementation of this interface in Julia.

Corresponding author: Daniel Harasim.
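Property (i) can be illustrated with a small sketch. It is written in Python rather than the authors' Julia, and it is not taken from the report: the category encoding `(key, scale degree)`, the three rule functions, the `plagal` rule, and the numeric weights are all hypothetical choices made for illustration. The sketch shows how a few partial functions stand in for many key-specific CFG rules, so that every instance of a rule shares one weight.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical category encoding: (key, scale degree 1..7).
Category = Tuple[str, int]
RHS = Tuple[Category, ...]
# A rewrite rule is a partial function: it returns None where it is undefined.
Rule = Callable[[Category], Optional[RHS]]


def prolongation(cat: Category) -> Optional[RHS]:
    """X -> X X, defined for every category, whatever the key."""
    return (cat, cat)


def preparation(cat: Category) -> Optional[RHS]:
    """X -> (fifth above X) X, staying inside the same key."""
    key, degree = cat
    fifth_above = (degree - 1 + 4) % 7 + 1  # diatonic fifth, e.g. 1 -> 5
    return ((key, fifth_above), cat)


def plagal(cat: Category) -> Optional[RHS]:
    """I -> IV I; undefined (None) for every degree other than 1."""
    key, degree = cat
    if degree != 1:
        return None
    return ((key, 4), cat)


RULES: List[Rule] = [prolongation, preparation, plagal]

# One illustrative, unnormalized weight per rule function (assumed values).
RULE_WEIGHTS = {"prolongation": 0.5, "preparation": 0.4, "plagal": 0.1}

KEYS = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]
CATEGORIES: List[Category] = [(k, d) for k in KEYS for d in range(1, 8)]


def expand_to_cfg(rules: List[Rule], categories: List[Category]):
    """List every plain CFG rule that the partial functions stand for."""
    cfg_rules = []
    for cat in categories:
        for rule in rules:
            rhs = rule(cat)
            if rhs is not None:  # skip categories where the rule is undefined
                weight = RULE_WEIGHTS[rule.__name__]
                cfg_rules.append((cat, rhs, weight))
    return cfg_rules


cfg_rules = expand_to_cfg(RULES, CATEGORIES)
distinct_weights = {weight for _, _, weight in cfg_rules}
```

Under these assumptions, three rule functions expand to 180 ordinary CFG rules (84 prolongations, 84 preparations, 12 plagal rules), yet only three weights need to be estimated, because each weight is looked up by rule function and never by key.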

Similar resources

ADP on Trees and Forests General Reforestation: Parsing Trees and Forests Efficient Dynamic Programming on Tree-like Data Structures with ADPfusion

Where string grammars describe how to generate and parse strings, tree grammars describe how to generate and parse trees. We show how to extend generalized algebraic dynamic programming to tree grammars. The resulting dynamic programming algorithms are efficient and provide the complete feature set available to string grammars, including automatic generation of outside parsers, algebra products...

Strictness Analysis for Attribute Grammars

Attribute grammars may be seen as a (rather specialised) lazy or demand-driven programming language. The “programs” in this language take text or parse trees as input and return values of the synthesised attributes to the root as output. From this observation we establish a framework for abstract interpretation of attribute grammars. The framework is used to construct a strictness analysis for ...

Faster Generalized LR Parsing

Tomita devised a method of generalized LR (GLR) parsing to parse ambiguous grammars efficiently. A GLR parser uses linear-time LR parsing techniques as long as possible, falling back on more expensive general techniques when necessary. Much research has addressed speeding up LR parsers. However, we argue that this previous work is not transferable to GLR parsers. Instead, we speed up LR parsers b...

A Generalized View on Parsing and Translation

We present a formal framework that generalizes a variety of monolingual and synchronous grammar formalisms for parsing and translation. Our framework is based on regular tree grammars that describe derivation trees, which are interpreted in arbitrary algebras. We obtain generic parsing algorithms by exploiting closure properties of regular tree languages.

Disambiguation Filters for Scannerless Generalized LR Parsers

Several real-world problems call for more parsing power than is offered by the widely used and well-established deterministic parsing techniques. These techniques also create an artificial divide between lexical and context-free analysis phases, at the cost of significant complexity at their interface. In this paper we present the fusion of generalized LR parsing and scannerless parsing. This c...

Journal:
  • CoRR

Volume: abs/1710.11301

Publication year: 2017